-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug: update Zenodo downloader for new API #373
Conversation
Update the Zenodo downloader to work with the new Zenodo API. Use the `filename` key for getting the filenames of all files in a repository. Update the generated download url based on heuristics: the link present in the API reference doesn't work at the moment). Update the method that populates the registry: use the new `filename` key and specify that the checksums are now md5.
@khaeru, I think this is ready. All tests are passing now, although I had to rerun a few because Zenodo had some downtime or returned bad requests while running the tests. Would you mind taking a look at the PR? |
Just tested, it does fix #371! :) |
Hi @santisoler —thanks for the confirmation and quick action here. I will check the code shortly. However, I think it is probably worth getting in touch with Zenodo support and explicitly asking them whether the API change is intentional and permanent (case 1 vs. case 2 in the issue I filed). I'm worried about something like this:
Then you'd have to revert the fix or something like that, and again make a new release. Better to confirm the situation first, right? |
@khaeru This is part of a long term transition plan to the new InvenioRDM system: https://blog.zenodo.org/2022/12/07/2022-12-07-zenodo-on-inveniordm/ that was also announced in a blog post: https://blog.zenodo.org/2023/09/25/2023-09-25-migration-postponed/ |
Thank you @dokempf for sharing the links. It seems that the API used in this PR is documented at: https://inveniordm.docs.cern.ch/reference/rest_api_drafts_records/#download-a-record-file @khaeru, would that be enough? Considering that this API changes is breaking any code using pooch to retrieve zenodo files, I imagine that it will be break many codes and test suites and that it would be appreciated if a fix could be released quickly! Thank you! 😃 |
Hi @dokempf —per the first sentence at #371, I'm well aware of the migration, although you see that I only link to the one blog post and two help pages that were linked by Zenodo themselves from their header banner. Those omit points and details that are included in the two older posts you link. It doesn't seem like Zenodo have provided all the info relevant to the migration in any single place. In particular at https://blog.zenodo.org/2022/12/07/2022-12-07-zenodo-on-inveniordm/ the "FAQs" section only increases confusion:
It seems these do not actually describe what has been done.
As @ericpre notes, this appears to describe the JSON that the API is now serving. However, scroll to the footer of any page at https://zenodo.org, and click the "REST API" link. This appears to be something different; I guess they have just forgotten to update this? Again, I would think it is an easy, no-regret step to simply contact Zenodo and ask them what the current status and plans are. But I'm not a maintainer of pooch, so if you all would like to go ahead based on the current guesses, please do. |
I've just sent a message to Zenodo explaining the issue of the new API breaking backward compatibility and asking if these changes are planned to be permanent or this is just a bug that was created during the migration process. I'm looking forward to their reply and we'll come back to you all with updates as soon as I receive it. For now, I'm holding this PR to be merged until further notice. Thanks @ericpre , @dokempf and @khaeru for joining the conversation! |
Thank you @santisoler, is there an issue that we can track somewhere? |
I sent them the message through their Contact form and left my email so they can reply me. I also invited them join the conversation here or in #371. I noticed some people started opening issues on Zenodo's Github about similar issues with the new API (zenodo/zenodo#2487, zenodo/zenodo#2486). I felt inclined to use their contact form since I don't see they are too responsive on Issues. |
Considering the scope of the contact form, it doesn't give me great confidence because it would see a lot of traffic and the chance that it get lost/slow to reply is not negligible - it does have a bug category, though! I would still suggest to merge and make a release because:
I am to make the case that merging this PR is highly beneficial with minimal because it fixes test suite and people's code -remember that there are no mitigating solution other than changing to other the pooch httpdownloader or stop using pooch, which both would require significantly more work. |
Hi @ericpre. I get you point and it's a priority for us to fix the Zenodo downloader as soon as possible without compromising the user experience or making decisions that will bite us in the future. We noticed that Zenodo published another blogpost today describing some other issues they experienced during the migration: https://blog.zenodo.org/2023/10/19/2023-10-19-upgrade-issues/ Considering this, we just had a discussion in our weekly meeting (you can see the meeting notes) and agreed to wait until Monday 23rd for a reply from Zenodo. If we don't receive a reply by the morning of Tuesday 24th (my morning in PDT) I'll merge a bugfix and make a new release that includes it so we can continue using Pooch for downloading files from Zenodo. Regarding the bugfix, we decided to take another approach. We want the downloader to be able to interact with both APIs (the "legacy" and the "new" one) and automatically decide which one it should use. We basically need to check if the I'll do my best to hit the deadline. I'd love to hear your inputs on this (cc @dokempf @khaeru). |
Hi all, just to say that I agree with @santisoler's approach here. I've use their contact form in the past and they were very good at answering so it's very possible that they'll respond (unless they got swamped with bug reports like this). |
As @leouieda pointed out, Zenodo replies quickly! I received the following reply from Jose:
Considering this reply, I'm sticking with the previous plan of waiting for a reply with further details on the API until next Monday, or release a Pooch version with a bugfix by next Tuesday. |
I've received another reply from Zenodo:
After checking out how the new API is working I've noticed that:
Considering this, I'm planning to merge #375 and make a new release, so we can add support for the new API but still supporting the old one in case Zenodo make changes to fulfill backward compatibility. |
I'm closing this in favor of #375 |
Update the Zenodo downloader to work with the new Zenodo API. Use the
filename
key for getting the filenames of all files in a repository. Update the generated download url based on heuristics: the link present in the API reference doesn't work at the moment). Update the method that populates the registry: use the newfilename
key and specify that the checksums are now md5.Relevant issues/PRs:
Fixes #371